2025.10.30 | A 7B model stages an upset in image-to-code; RL breakthrough for video thinking
Description
The 15 papers in this episode are:
[00:22] 👁 JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
[01:00] 🧠 Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
[01:55] 🔄 ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
[02:42] 🔄 Scaling Latent Reasoning via Looped Language Models
[03:22] 🧠 Reasoning-Aware GRPO using Process Mining
[03:52] 🎬 VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
[04:33] 🏆 The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
[05:11] 🖼 RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
[06:22] 🎮 ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
[06:58] 🧭 Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks
[07:44] 🔗 PairUni: Pairwise Training for Unified Multimodal Language Models
[08:33] ⚡ Parallel Loop Transformer for Efficient Test-Time Computation Scaling (Parallel Loop Transformer: zero-latency test-time compute scaling)
[09:08] 🚗 Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
[09:55] 🧬 ODesign: A World Model for Biomolecular Interaction Design (ODesign: an all-atom generative world model for biomolecular interaction design)
[10:31] 🧬 Evolving Diagnostic Agents in a Virtual Clinical Environment
【Follow us】
You can also find us on the following platforms for more information beyond the podcast content.
Xiaohongshu: AI速递







